Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction
Authors
Weizhong Zhang, Bin Hong, Wei Liu, Jieping Ye, Deng Cai, Xiaofei He, Jie Wang
Abstract
Sparse support vector machine (SVM) is a popular classification technique that can simultaneously learn a small set of the most interpretable features and identify the support vectors. It has achieved great success in many real-world applications. However, for large-scale problems involving a huge number of samples and extremely high-dimensional features, solving sparse SVMs remains challenging. Noting that sparse SVMs induce sparsity in both the feature and sample spaces, we propose a novel approach, based on accurate estimations of the primal and dual optima of sparse SVMs, to simultaneously identify the features and samples that are guaranteed to be irrelevant to the outputs. We can thus remove the identified inactive samples and features from the training phase, leading to substantial savings in both memory usage and computational cost without sacrificing accuracy. To the best of our knowledge, the proposed method is the first static feature and sample reduction method for sparse SVM. Experiments on both synthetic and real datasets (e.g., the kddb dataset with about 20 million samples and 30 million features) demonstrate that our approach significantly outperforms state-of-the-art methods, and the speedup gained by our approach can be orders of magnitude.
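To make the sparsity structure concrete, consider a standard L1-regularized hinge-loss SVM. This is only an illustrative sketch; the paper's actual objective may differ (for example, by adding an L2 term or smoothing the hinge loss):

\min_{w \in \mathbb{R}^{d}} \ \frac{1}{n} \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i \langle w, x_i \rangle \bigr) \;+\; \lambda \lVert w \rVert_{1}

Under such a formulation, the L1 penalty drives many coordinates of w to exactly zero (inactive features), while samples correctly classified outside the margin receive zero dual variables (non-support, i.e., inactive samples). Screening rules built from estimates of the primal and dual optima can certify some of these zeros before training and safely discard the corresponding features and samples.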
Similar Resources
Supplemental Material: Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction
Weizhong Zhang*(1,2), Bin Hong*(1,3), Wei Liu(2), Jieping Ye(3), Deng Cai(1), Xiaofei He(1), Jie Wang(3); (1) State Key Lab of CAD&CG, Zhejiang University, China; (2) Tencent AI Lab, Shenzhen, China; (3) University of Michigan, USA. In this supplement, we first present the detailed proofs of all the theorems in the main text and then report the remaining experimental results that are omitted from the experiment section due ...
Scaling Up Sparse Support Vector Machine by Simultaneous Feature and Sample Reduction
Sparse support vector machine (SVM) is a popular classification technique that can simultaneously learn a small set of the most interpretable features and identify the support vectors. It has achieved great successes in many real-world applications. However, for large-scale problems involving a huge number of samples and extremely high-dimensional features, solving sparse SVMs remains challengi...
Trading Accuracy for Size: Online Small SVMs via Linear Independence in the Feature Space
Support Vector Machines (SVMs) are a machine learning method rooted in statistical learning theory. One of their most interesting characteristics is that the solution achieved during training is sparse, meaning that a few samples are usually considered “important” by the algorithm (the so-called support vectors) and account for most of the complexity of the classification/regression task. I...
SVM Classifier Incorporating Feature Selection Using GA for Spam Detection
The use of SVMs (Support Vector Machines) in detecting e-mail as spam or non-spam by incorporating feature selection using a GA (Genetic Algorithm) is investigated. A GA approach is adopted to select the features that are most favorable to the SVM classifier, which is named GA-SVM. A scaling factor is exploited to measure the relevance of each feature to the classification task and is estimated by...
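As a rough illustration of the GA-SVM idea sketched above, the following is a minimal, hypothetical Python sketch of genetic-algorithm feature selection wrapped around an SVM, with fitness defined as cross-validated SVM accuracy on the selected feature subset. It is not the cited paper's implementation and omits the scaling-factor relevance estimate; all names and parameters here are illustrative.

# Hedged sketch: GA wrapper for SVM feature selection (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=30, n_informative=8, random_state=0)

def fitness(mask):
    # Fitness = cross-validated SVM accuracy on the selected feature subset.
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(kernel="linear"), X[:, mask], y, cv=3).mean()

def evolve(pop_size=20, n_gen=15, p_mut=0.05):
    # Binary chromosomes: one bit per feature (True = keep the feature).
    pop = rng.random((pop_size, X.shape[1])) < 0.5
    for _ in range(n_gen):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]           # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, X.shape[1])           # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(X.shape[1]) < p_mut     # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ind) for ind in pop])
    return pop[scores.argmax()]

best_mask = evolve()
print("selected features:", np.flatnonzero(best_mask))

A practical variant would add elitism, a better selection scheme, and a sparsity penalty in the fitness, but this captures the wrapper-style search that the snippet describes.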
Karhunen-Loeve Transform and Sparse Representation Based Plant Leaf Disease Recognition
To improve the classification accuracy of apple leaf disease images and solve the problem of dimension redundancy in feature extraction, the Karhunen-Loeve (K-L) transform and sparse representation are applied to apple leaf disease recognition. First, 9 color features and 8 texture features of disease leaf images are extracted and taken as feature vectors after dimensionality reduction by the...
Journal title:
Volume / Issue:
Pages: -
Publication date: 2017